Abstract

We consider an original problem that arises from the issue of security analysis of a power 
system and that we name optimal discovery with probabilistic expert advice. We address 
it with an algorithm based on the optimistic paradigm and the Good-Turing missing mass
estimator. We show that this strategy uniformly attains the optimal discovery rate in a 
macroscopic limit sense, under some assumptions on the probabilistic experts. We also 
provide numerical experiments suggesting that this optimal behavior may still hold under 
weaker assumptions. 

Keywords: optimal discovery, probabilistic experts, optimistic algorithm, Good-Turing
estimator, UCB

1 Introduction 

In this paper we consider the following problem: let $\mathcal{X}$ be a set, and let $A \subset \mathcal{X}$ be a set of interesting
elements in $\mathcal{X}$. One can access $\mathcal{X}$ only through requests to a finite set of probabilistic experts.
More precisely, when one makes a request to the $i$-th expert, the latter draws independently at
random a point from a fixed probability distribution $P_i$ over $\mathcal{X}$. One is interested in discovering
rapidly as many elements of $A$ as possible, by making sequential requests to the experts.



1.1 Motivation 

The original motivation for this problem arises from the issue of real-time security analysis of a 
power system. This problem often amounts to identifying in a set of credible contingencies those 
that may indeed endanger the security of the power system and perhaps lead to a system collapse 
with catastrophic consequences (e.g., an entire region or country may be without electrical power
for hours). Once those dangerous contingencies have been identified, the system operators
usually take preventive actions so as to ensure that they could mitigate their effect on the
system should they occur. Note that the dangerous contingencies are usually
very rare compared with the non-dangerous ones. A straightforward approach for tackling
this security analysis problem is to simulate the power system dynamics for every credible
contingency so as to identify those that are indeed dangerous. Unfortunately, when the set
of credible contingencies contains a large number of elements (say, there are more than 10^
credible contingencies) such an approach may no longer be possible, since the computational
resources required to simulate every contingency may exceed those that are usually available
during the few (tens of) minutes available for the real-time security analysis. One is therefore left
with the problem of identifying within this short time-frame a maximum number of dangerous 
contingencies rather than all of them. The approach proposed in |FB1H iFBED^To] addresses 
this problem by building first very rapidly what could be described as a probability distribution 
P over the set of credible contingencies that points with significant probability to contingencies 
which are dangerous. Afterwards, this probability distribution is used to draw the contingencies 
to be analyzed through simulations. When the computational resources are exhausted, the 
approach outputs the contingencies found to be dangerous. One of the main shortcomings
of this approach is that P usually gives a significant probability to only a few of the
dangerous contingencies, not to all of them. As a result, after a few draws, this probability distribution
is not more likely to generate new dangerous contingencies than, for example, a
uniform one. The dangerous contingencies to which P gives a significant probability
depend however strongly on the set of (sometimes arbitrary) engineering choices that have been 
made for building it. One possible strategy to ensure that more dangerous contingencies can 
be identified within a limited budget of draws would therefore be to consider K > 1 sets of
engineering choices to build K different probability distributions $P_1, P_2, \ldots, P_K$ and to draw
the contingencies from these K distributions rather than only from a single one. This strategy
raises however an important question, which this paper tries to answer: how should the
distributions be selected so as to generate, with a given number of draws, a maximum
number of dangerous contingencies? We consider the specific case where the contingencies are 
sequentially drawn and where the distribution selected for generating a contingency at one 
instant can be based on the past distributions that have been selected, the contingencies that 
have been already drawn and the results of the security analyses (dangerous/non dangerous) 
for these contingencies. This corresponds exactly to the optimal discovery problem with expert 
advice described above. We believe that this framework has many other possible applications,
such as web-based content access.

1.2 Setting and notation 

In this paper we restrict our attention to finite or countably infinite sets $\mathcal{X}$. We denote by
$K$ the number of experts. For each $i \in \{1,\ldots,K\}$, we assume that $(X_{i,n})_{n\ge1}$ are random
variables with distribution $P_i$ such that the $(X_{i,n})_{i,n}$ are independent. Sequential discovery
with probabilistic expert advice can be described as follows: at each time step $t \in \mathbb{N}^*$, one picks
an index $I_t \in \{1,\ldots,K\}$, and one observes $X_{I_t,\,n_{I_t,t}}$, where

$$n_{i,t} = \sum_{s\le t} \mathbb{1}\{I_s = i\}.$$

The goal is to choose the $(I_t)_{t\ge1}$ so as to observe as many elements of $A$ as possible in a fixed
horizon $t$, or equivalently to observe all the elements of $A$ within as few time steps as possible.
The index $I_{t+1}$ may be chosen according to past observations: it is a (possibly randomized)
function of $(I_1, X_{I_1,n_{I_1,1}}, \ldots, I_t, X_{I_t,n_{I_t,t}})$. We are mainly interested in the number of interesting
items found by the strategy after $t$ time steps:

$$F(t) = \sum_{x\in A} \mathbb{1}\left\{x \in \{X_{1,1},\ldots,X_{1,n_{1,t}},\ldots,X_{K,1},\ldots,X_{K,n_{K,t}}\}\right\}.$$

Note in particular that it is of no interest to observe the same element of $A$ twice.

While Algorithm Good-UCB, presented in Section 2, can be used in a more general setting
(as illustrated in Section 6), for the mathematical analysis we restrict our attention to the case
of probabilistic experts with the following properties: 

(i) non-intersecting supports: $A \cap \mathrm{supp}(P_i) \cap \mathrm{supp}(P_j) = \emptyset$ for $i \ne j$,

(ii) finite supports with the same cardinality: $|\mathrm{supp}(P_i)| = N$, $\forall i \in \{1,\ldots,K\}$,

(iii) uniform distributions: $P_i(x) = 1/N$, $\forall x \in \mathrm{supp}(P_i)$, $\forall i \in \{1,\ldots,K\}$.

These assumptions are made in order to be able to compare the performance of the Good-UCB
algorithm to an "oracle", described below. Indeed, in that case, this oracle has a very
simple behavior. In this setting it is convenient to reparametrize the problem slightly (in
particular we make explicit the dependency on $N$ for reasons that will appear later). Let
$\mathcal{X}^N = \{1,\ldots,K\} \times \{1,\ldots,N\}$, $A^N \subset \mathcal{X}^N$ the set of interesting items of $\mathcal{X}^N$, and $Q^N = |A^N|$
the number of interesting items. We assume that, for expert $i \in \{1,\ldots,K\}$, $P_i^N$ is the uniform
distribution on $\{i\} \times \{1,\ldots,N\}$. We also denote by $Q_i^N = |A^N \cap (\{i\} \times \{1,\ldots,N\})|$ the number
of interesting items accessible through requests to expert $i$. Further notation is given in Section 3.

1.3 Contribution and content of the paper 

This paper contains the description of a generic algorithm for the optimal discovery problem 
with probabilistic expert advice, and a theoretical proof of optimality in a particular setting. 
In Section 2 we first describe our strategy, termed Good-UCB. This algorithm relies on the
optimistic paradigm (which led to the UCB (Upper Confidence Bound) algorithm for multi-armed
bandits, [ACBF02]) and on a finite-time analysis of the Good-Turing estimator for the
missing mass. In order to analyze and quantify the performance of this strategy, we compare it 
with the oracle (closed-loop) policy, a virtual algorithm that would be aware, at each time, of the 
probability of each item under each distribution, and would thus be able to sample optimally. 
This strategy is carefully described and analyzed in Section 3. The analysis is performed under
the non-intersecting and uniform draws assumptions (i), (ii), (iii) described above, and in a
macroscopic limit sense, that is when the size of the set $\mathcal{X}$ grows to infinity while maintaining a
constant proportion of interesting items. More precisely we prove the following theorem, where
$F^N(t)$ is the number of interesting items found by the oracle policy after $t$ time steps.

Theorem 1. Assume that, for all $i \in \{1,\ldots,K\}$, $Q_i^N/N$ converges to $q_i \in\ ]0,1[$ as $N$ goes to
infinity. Then, almost surely, the sequence of mappings $t \mapsto F^N(\lfloor Nt \rfloor)/N$ converges uniformly
on $\mathbb{R}_+$ to a limit denoted $F$ as $N$ goes to infinity.

In Section 3 we also give an explicit expression for the limit $F$. Section 5 presents a study
of the oracle open-loop policy, which is defined as the optimal fixed allocation. In this problem
it turns out that the oracle open-loop policy achieves the same performance as the oracle
closed-loop policy, which in turn yields another formula for the macroscopic discovery rate
$F$. In particular, these formulas make it easy to see the difference in the macroscopic behavior
between optimal policies and suboptimal policies such as uniform requests; see Remark 1 for
more details.



The main result of the paper is given in Section 4. We show that Good-UCB is a macroscopically
optimal policy, that is, the performance of Good-UCB tends to that of the
oracle policy. More precisely, let $\tilde F^N(t)$ be the number of interesting items found by Good-UCB
after $t$ time steps.

Theorem 2. Assume that, for all $i \in \{1,\ldots,K\}$, $Q_i^N/N$ converges to $q_i \in\ ]0,1[$ as $N$ goes to
infinity. Then, almost surely, the sequence of mappings $t \mapsto \tilde F^N(\lfloor Nt \rfloor)/N$ converges uniformly
on $\mathbb{R}_+$ to the limiting proportion $F$ found during the same time by the oracle policy.

Section 6 reports experimental results that show that the Good-UCB algorithm performs
very well, even in a setting where assumptions (i), (ii) and (iii) are no longer satisfied.
Finally, Section 7 concludes.



2 The Good-UCB algorithm 

We describe here the Good-UCB strategy. This algorithm is a sequential method estimating at 
time t, for each expert i £ {1, . . . , K}, the total probability of the interesting items that remain 
to be discovered through requests to expert i. This estimation is done by adapting the so-called 
Good-Turing estimator for the missing mass. Then, instead of simply using the distribution
with highest estimated missing mass, which proves hazardous, we make use of the optimistic
paradigm (see [Agr95, ACBF02] and references therein), a heuristic principle well known in
reinforcement learning, which consists in using an upper-confidence bound (UCB) of the
missing mass instead. At a given time step, the Good-UCB algorithm simply makes a request
to the expert with highest upper-confidence bound on the missing mass at this time step. We
start with the Good-Turing estimator and a brief study of its concentration properties. Then
we describe precisely the Good-UCB strategy. 



2.1 Estimating the missing mass 

Our algorithm relies on an estimation at each step of the probability of obtaining a new interesting
item by making a request to a given expert. A similar issue was addressed by I. Good
and A. Turing as part of their efforts to crack German ciphers for the Enigma machine during
World War II. In this subsection, we describe a version of the Good-Turing estimator adapted
to our problem. Let $\Omega$ be a discrete set, and let $A$ be a subset of interesting elements of $\Omega$.
Assume that $X_1, \ldots, X_n$ are elements of $\Omega$ drawn independently under the same distribution
$P$, and define for every $x \in \Omega$:

$$O_n(x) = \sum_{m=1}^n \mathbb{1}\{X_m = x\}, \qquad Z_n(x) = \mathbb{1}\{O_n(x) = 0\}, \qquad U_n(x) = \mathbb{1}\{O_n(x) = 1\}.$$

Let $p_{\max} = \max\{P(x) : x \in \Omega\}$, let $R_n = \sum_{x\in A} Z_n(x) P(x)$ denote the missing mass of the
interesting items, and let $U_n = \sum_{x\in A} U_n(x)$ denote the number of elements of $A$ that have been
seen exactly once (in linguistics, they are often called hapaxes). The idea of the Good-Turing
estimator ([Goo53], see also [MS00, DSZ03] and references therein) is to estimate the (random)
"missing mass", which is the total probability of all the interesting items that do not occur
in the sample $X_1, \ldots, X_n$, by the "fraction of hapaxes" $\hat R_n = U_n/n$. This estimator is
well known in linguistics, for instance in order to estimate the number of words in some language,
see [GS95]. For our particular needs, we derive (using similar techniques as in [MS00]) the
following upper-bound on the estimation error:

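Proposition 1. With probability at least $1 - \delta$,

$$\hat R_n - \frac{1}{n} - \sqrt{\frac{(2/n + p_{\max})^2\, n \log(2/\delta)}{2}} \;\le\; R_n \;\le\; \hat R_n + \sqrt{\frac{(2/n + p_{\max})^2\, n \log(2/\delta)}{2}}.$$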

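Proof: The random variable $W_n = \hat R_n - R_n$ is a function of the independent observations
$X_1, \ldots, X_n$ such that, denoting $W_n = f(X_1, \ldots, X_n)$, modifying just one observation has limited
impact: $\forall l \in \{1,\ldots,n\}$, $\forall (x_1, \ldots, x_n, x_l') \in \Omega^{n+1}$,

$$\left|f(x_1, \ldots, x_n) - f(x_1, \ldots, x_{l-1}, x_l', x_{l+1}, \ldots, x_n)\right| \le \frac{2}{n} + p_{\max}.$$

By applying McDiarmid's inequality [McD89], one gets that, with probability at least $1 - \delta$,

$$\left|W_n - \mathbb{E}[W_n]\right| \le \sqrt{\frac{(2/n + p_{\max})^2\, n \log(2/\delta)}{2}}.$$

Moreover,

$$\mathbb{E}[W_n] = \sum_{x\in A}\left(\frac{1}{n}\, nP(x)\big(1-P(x)\big)^{n-1} - P(x)\big(1-P(x)\big)^n\right) = \frac{1}{n}\sum_{x\in A} P(x) \times nP(x)\big(1-P(x)\big)^{n-1} \in \left[0, \frac{1}{n}\right],$$

which concludes the proof.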
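A minimal Python sketch of the Good-Turing estimate $\hat R_n = U_n/n$ and of the confidence interval of Proposition 1 is given below. The function names `good_turing` and `prop1_interval`, the uniform distribution on 1000 items and the choice of every tenth item as interesting are illustrative assumptions, not part of the original setting; the small experiment only counts how often the interval contains the true missing mass $R_n$.

```python
import math
import random
from collections import Counter


def good_turing(sample, interesting, prob):
    """Return (R_hat, R) for a sample drawn i.i.d. from the distribution `prob`.

    R_hat: Good-Turing estimate, i.e. number of interesting hapaxes divided by n.
    R:     true missing mass, i.e. total probability of interesting items never observed.
    """
    n = len(sample)
    counts = Counter(sample)  # O_n(x); Counter returns 0 for unseen x
    hapaxes = sum(1 for x in interesting if counts[x] == 1)
    missing = sum(prob[x] for x in interesting if counts[x] == 0)
    return hapaxes / n, missing


def prop1_interval(r_hat, n, p_max, delta):
    """Confidence interval for R_n given by Proposition 1 (level 1 - delta)."""
    width = math.sqrt((2.0 / n + p_max) ** 2 * n * math.log(2.0 / delta) / 2.0)
    return r_hat - 1.0 / n - width, r_hat + width


if __name__ == "__main__":
    # Hypothetical example: uniform distribution on 1000 items, every 10th item interesting.
    rng = random.Random(0)
    items = list(range(1000))
    prob = {x: 1.0 / len(items) for x in items}
    interesting = set(range(0, 1000, 10))
    n, delta, covered, trials = 500, 0.05, 0, 200
    for _ in range(trials):
        sample = rng.choices(items, k=n)
        r_hat, r = good_turing(sample, interesting, prob)
        lo, hi = prop1_interval(r_hat, n, max(prob.values()), delta)
        covered += lo <= r <= hi
    print(f"empirical coverage: {covered / trials:.2f}")
```

With these sizes the interval is wide compared with the fluctuations of $\hat R_n - R_n$, which reflects the worst-case nature of McDiarmid's inequality.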

2.2 The Good-UCB algorithm 

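Following the example of the well-known Upper-Confidence Bound procedure for multi-armed
bandit problems, we propose Algorithm 1, which we call Good-UCB in reference to the estimation
procedure it relies on. For every arm $i \in \{1,\ldots,K\}$ and for every $t \in \mathbb{N}$, denote

$$n_{i,t} = \sum_{s=1}^t \mathbb{1}\{I_s = i\}, \qquad O_{i,t}(x) = \sum_{s=1}^t \mathbb{1}\{I_s = i, X_s = x\}, \qquad O_t(x) = \sum_{i=1}^K O_{i,t}(x),$$

$$U_{i,t}(x) = \mathbb{1}\{O_{i,t}(x) = O_t(x) = 1\}, \qquad Z_t(x) = \mathbb{1}\{O_t(x) = 0\}.$$

For each arm $i \in \{1,\ldots,K\}$, the index at time $t$ is composed of the estimate

$$\hat R_{i,t-1} = \frac{1}{n_{i,t-1}} \sum_{x\in A} U_{i,t-1}(x)$$

of the missing mass

$$R_{i,t-1} = \sum_{x\in A} Z_{t-1}(x)\, P_i(x),$$

inflated by a confidence bonus of order $\sqrt{\log(t)/n_{i,t-1}}$. Good-UCB relies on a tuning parameter
$c$, which is discussed below.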

Algorithm 1 Good-UCB

1: For $1 \le t \le K$ choose $I_t = t$.
2: for $t \ge K + 1$ do
3:   Choose $I_t = \arg\max_{1 \le i \le K} \left\{\hat R_{i,t-1} + c\sqrt{\log(t)/n_{i,t-1}}\right\}$
4:   Observe $X_t$ distributed as $P_{I_t}$ and update $\hat R_{1,t}, \ldots, \hat R_{K,t}$ accordingly
5: end for

Note that the Good-UCB algorithm is designed for more general probabilistic experts than
those satisfying assumptions (i), (ii), (iii). In particular, since we do not make the
non-intersecting supports assumption (i), the missing mass of a given expert $i$ depends explicitly on
the outcomes of all requests (and not only requests to expert $i$). Note also that the bounds of
Proposition 1 hold for all discrete distributions. The experiments of Section 6 validate these
observations, and show that Good-UCB behaves very well even when assumptions (i), (ii), (iii)
are not met. However, for the theoretical analysis of our algorithm, we focus on large values of
$N$ under the non-intersecting and uniform draws assumptions (i), (ii), (iii): indeed, in that case
the performance of the oracle strategy is simple and deterministic, so that the optimality of the
Good-UCB algorithm can be analyzed. More precisely, we will show that, in the macroscopic
limit, the number of items found at each time by Good-UCB converges to the number of items
found by the closed-loop oracle strategy that knows the number of interesting items to find with
each expert at every time, and that may use this information to make its choice. In order to
prove this, we first analyze the performance of such an oracle strategy.
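The following Python sketch runs Good-UCB under assumptions (i), (ii), (iii): each expert draws uniformly among $N$ items and, because supports are disjoint, $O_{i,t}(x) = O_t(x)$ for the items of expert $i$, so its hapaxes are simply the interesting items seen exactly once through that expert. It is a sketch under stated assumptions: the exact bonus $c\sqrt{\log(t)/n_{i,t-1}}$ is our reading of line 3 of Algorithm 1, ties are broken towards the lowest index, and the function name `good_ucb` and its parameters are hypothetical.

```python
import math
import random


def good_ucb(interesting_sets, n_items, horizon, c=0.5, seed=0):
    """Sketch of Good-UCB under assumptions (i)-(iii).

    interesting_sets[i]: interesting labels x in {0, ..., n_items - 1} of expert i
    (supports are disjoint, so item (i, x) can only be produced by expert i).
    Returns history, where history[t - 1] is the number of distinct interesting
    items found after t requests.
    """
    rng = random.Random(seed)
    K = len(interesting_sets)
    counts = [dict() for _ in range(K)]  # O_{i,t}(x) for interesting x of expert i
    n = [0] * K                          # n_{i,t}: number of requests to expert i
    hapaxes = [0] * K                    # interesting items of expert i seen exactly once
    found, history = 0, []
    for t in range(1, horizon + 1):
        if t <= K:
            i = t - 1                    # line 1: request each expert once
        else:
            # line 3, assumed form of the index: R_hat_{i,t-1} + c * sqrt(log(t) / n_{i,t-1})
            i = max(range(K), key=lambda j: hapaxes[j] / n[j]
                    + c * math.sqrt(math.log(t) / n[j]))
        x = rng.randrange(n_items)       # uniform draw on the support of expert i
        n[i] += 1
        if x in interesting_sets[i]:
            k = counts[i].get(x, 0)
            counts[i][x] = k + 1
            if k == 0:                   # new interesting item: it becomes a hapax
                found += 1
                hapaxes[i] += 1
            elif k == 1:                 # second sighting: no longer a hapax
                hapaxes[i] -= 1
        history.append(found)
    return history


if __name__ == "__main__":
    curve = good_ucb([set(range(25)), set(range(5))], n_items=50, horizon=500)
    print("interesting items found after 500 requests:", curve[-1])
```

Only the interesting items need to be tracked, since the estimate $\hat R_{i,t}$ counts hapaxes among elements of $A$.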



3 The closed-loop oracle strategy 

From now on we restrict our attention to the setting described in Section 1.2 with sets $\mathcal{X}^N$, $A^N$
and expert distributions $P_1^N, \ldots, P_K^N$. Denote by $B_i^N$ the set of interesting items supported
by $P_i^N$: $B_i^N = \{x \in \{1,\ldots,N\} : (i,x) \in A^N\}$. Let $Q_i^N = |B_i^N|$; in particular, note that

$$Q^N = Q_1^N + \cdots + Q_K^N.$$

Without loss of generality, we will assume in the analysis that $Q_1^N \ge Q_2^N \ge \cdots \ge Q_K^N$.
Successive draws of expert $i$ are denoted $(i, X_{i,1}^N), (i, X_{i,2}^N), \ldots$, where the
variables $(X_{i,n}^N)_{i,n}$ are assumed to be independent. We denote by $(D_{i,k}^N)_{1 \le k \le Q_i^N}$ the increasing
sequence of the indices corresponding to draws for which new interesting items are discovered
with expert $i$:

$$D_{i,1}^N = \min\{n \ge 1 : X_{i,n}^N \in B_i^N\}, \qquad D_{i,2}^N = \min\{n > D_{i,1}^N : X_{i,n}^N \in B_i^N \setminus \{X_{i,D_{i,1}^N}^N\}\}, \qquad \ldots$$

Reciprocally, we denote by $F_i^N(n) = \max\{k \in \mathbb{N} : D_{i,k}^N \le n\}$ the number of items found in the first
$n$ draws. We also define $D_{i,0}^N = 0$ and, for $k \ge 1$, $S_{i,k}^N = D_{i,k}^N - D_{i,k-1}^N$. The random variables
$S_{i,k}^N$ ($1 \le i \le K$, $1 \le k \le Q_i^N$) are independent with geometric distribution $\mathcal{G}((1 + Q_i^N - k)/N)$. In
particular, for all $k \ge 1$,

$$D_{i,k}^N = \sum_{j=1}^k S_{i,j}^N. \qquad (1)$$

3.1 Description of the closed-loop oracle policy

When the values of $Q_1^N, \ldots, Q_K^N$ are known, so that the number of interesting items to find
with each expert is known at every step, a horizon-free optimal closed-loop strategy (denoted
in the following as the "oracle closed-loop strategy" or as OCL) consists in making a request,
at each time step, to one of the experts with the highest number of still undiscovered interesting
items. Hence, an OCL strategy can:

• first request expert 1 for $D_{1,Q_1^N - Q_2^N}^N$ steps;

• then, alternately request

  - expert 1 for $S_{1,Q_1^N - Q_2^N + 1}^N$ steps;

  - expert 2 for $S_{2,1}^N$ steps;

  - expert 1 for $S_{1,Q_1^N - Q_2^N + 2}^N$ steps;

  - expert 2 for $S_{2,2}^N$ steps;

  - and so on, until there are only $Q_3^N$ undiscovered interesting items on experts 1 and
    2;

• and so on, including successively experts 3, 4, ..., in the alternation.

For every $l \in \{0,\ldots,Q_1^N\}$, we shall be particularly interested in the waiting time $T^N(l)$ until,
under the OCL strategy described above, all experts have at most $l$ undiscovered interesting
items. Obviously,

$$T^N(l) = \sum_{i:\,Q_i^N > l} D_{i,\,Q_i^N - l}^N,$$

so that, by Equation (1),

$$T^N(l) = \sum_{i:\,Q_i^N > l}\ \sum_{k=1}^{Q_i^N - l} S_{i,k}^N. \qquad (2)$$

At that time, the number of items discovered so far is $G^N(l) = \sum_{i=1}^K (Q_i^N - l)_+$. For every
$f \in \{0,\ldots,Q^N\}$, we define $L^N(f)$ to be the maximal number of undiscovered items remaining
on an expert, once $f$ interesting items have been discovered altogether. In particular, note
that $L^N(G^N(l)) = l$ for all $l$, and that $G^N(L^N(f)) \in \{f - K + 1, \ldots, f\}$ for all $f$. Besides, the
time $\tau^N(f)$ required by the OCL strategy to collect $f$ interesting items satisfies

$$T^N(L^N(f)) \le \tau^N(f) \le T^N(L^N(f) - 1). \qquad (3)$$

The performance of the OCL strategy is perhaps more explicitly expressed by the pseudo-inverse
of the mapping $\tau^N$: for every integer $t$, let $F^N(t)$ denote the total number of items
found up to time $t$. $F^N$ and $\tau^N$ are related as follows:

$$\forall t \ge 0, \quad F^N(t) = \max\{f \in \{0,\ldots,Q^N\} : \tau^N(f) \le t\},$$

$$\forall f \in \{0,\ldots,Q^N\}, \quad \tau^N(f) = \min\{t \ge 0 : F^N(t) = f\}. \qquad (4)$$
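Under assumptions (i), (ii), (iii), a request to expert $i$ reveals a new interesting item with probability (number of undiscovered interesting items of expert $i$)$/N$, so the OCL strategy and the pseudo-inverse relation (4) can be simulated without materializing the items. The Python sketch below assumes this Bernoulli shortcut and breaks ties towards the lowest index; `oracle_closed_loop` and `waiting_times` are hypothetical names.

```python
import random


def oracle_closed_loop(Q, N, horizon, seed=0):
    """Sketch of the OCL strategy under assumptions (i)-(iii).

    Q[i] is the number of interesting items of expert i (out of N equiprobable items).
    At each step the expert with most undiscovered interesting items is requested;
    the request reveals a new interesting item with probability remaining[i] / N.
    Returns F with F[t] = number of interesting items found after t steps (F[0] = 0).
    """
    rng = random.Random(seed)
    remaining = list(Q)
    F = [0]
    for _ in range(horizon):
        i = max(range(len(remaining)), key=lambda j: remaining[j])
        if rng.random() < remaining[i] / N:
            remaining[i] -= 1
        F.append(sum(Q) - sum(remaining))
    return F


def waiting_times(F):
    """tau(f) = min{t >= 0 : F(t) = f}, the pseudo-inverse of Equation (4)."""
    tau = {}
    for t, f in enumerate(F):
        tau.setdefault(f, t)
    return tau


if __name__ == "__main__":
    F = oracle_closed_loop([30, 10], N=100, horizon=3000)
    print("items found:", F[-1], "time to find 20 items:", waiting_times(F).get(20))
```

Since $F^N$ increases by at most one per step, every value $f \le F^N(t)$ is reached, so the dictionary returned by `waiting_times` contains all of them.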

3.2 Macroscopic limit 

We consider a macroscopic limit where $N$ goes to infinity together with the $Q_i^N$ in such a way
that $Q_i^N/N \to q_i \in\ ]0,1[$. Let $q = q_1 + \cdots + q_K$. We will show that if $(f^N)$ is a sequence of integers
such that $f^N/N \to \phi \in\ ]0,q[$, then the normalized waiting time $\tau^N(f^N)/N$ converges as $N$ goes
to infinity to a deterministic limit $\tau(\phi)$ that we will compute as a function of $\phi$ and $q_1, \ldots, q_K$.
We start with some notation. Let $G : [0,q_1] \to [0,q]$ be the strictly decreasing mapping defined
by

$$G(\lambda) = \sum_{i=1}^K (q_i - \lambda)_+,$$

and let $L : [0,q] \to [0,q_1]$ be the inverse mapping. The following lemmas are proved in the
appendix. The second lemma is the key step of the macroscopic analysis.






Lemma 1. The mappings $G$ and $L$ are the limits of the sequences of mappings $(G^N)_N$ and
$(L^N)_N$, respectively, in the following sense:

(i) If $l^N \in [0, Q_1^N]$ defines a sequence of integers, then $l^N/N \to \lambda \in [0,q_1]$ if and only if
$G^N(l^N)/N \to G(\lambda)$ as $N$ goes to infinity.

(ii) If $f^N \in [0, Q^N]$ defines a sequence of integers, then $f^N/N \to \phi \in [0,q]$ if and only if
$L^N(f^N)/N \to L(\phi)$ as $N$ goes to infinity.



Lemma 2. For every $\lambda \in\ ]0,q_1]$, let

$$T(\lambda) = \sum_{i:\,q_i > \lambda} \log\frac{q_i}{\lambda}.$$

For every sequence $(l^N)_N$ such that $l^N/N$ converges to $\lambda$ as $N$ goes to infinity, $T^N(l^N)/N$
converges almost surely to $T(\lambda)$ as $N$ goes to infinity.
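Equation (2) writes $T^N(l)$ as a sum of independent geometric variables, which gives a direct way to check Lemma 2 numerically. The Monte Carlo sketch below is illustrative only: the sizes ($N = 20000$, $q = (0.5, 0.25)$, $\lambda = 0.1$), the inverse-CDF geometric sampler and the function names are our choices.

```python
import math
import random


def T_limit(q, lam):
    """Macroscopic waiting time T(lambda) = sum_{i: q_i > lambda} log(q_i / lambda)."""
    return sum(math.log(qi / lam) for qi in q if qi > lam)


def geometric(p, rng):
    """Geometric variable on {1, 2, ...} with success probability p (inverse CDF)."""
    return 1 + int(math.log(1.0 - rng.random()) / math.log(1.0 - p))


def T_finite(Q, N, l, rng):
    """One sample of T^N(l) from Equation (2): S_{i,k} ~ Geom((1 + Q_i - k) / N)."""
    return sum(
        geometric((1 + Qi - k) / N, rng)
        for Qi in Q
        for k in range(1, Qi - l + 1)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    N, Q, l = 20000, [10000, 5000], 2000  # q = (0.5, 0.25), lambda = 0.1
    print("T^N(l)/N =", T_finite(Q, N, l, rng) / N, " T(lambda) =", T_limit([0.5, 0.25], 0.1))
```

The standard deviation of $T^N(l)/N$ is of order $\sqrt{(1/\lambda - 1/q_i)/N}$ per expert, so the agreement improves as $N$ grows.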

Theorem 3. For every sequence $(f^N)_N$ such that $f^N \in \{0,\ldots,Q^N\}$ and $f^N/N$ converges to
$\phi$ as $N$ goes to infinity, $\tau^N(f^N)/N$ converges a.s. to $\tau(\phi) = T(L(\phi))$ as $N$ goes to infinity.

Proof: By Equation (3),

$$\frac{T^N(L^N(f^N))}{N} \le \frac{\tau^N(f^N)}{N} \le \frac{T^N(L^N(f^N) - 1)}{N}.$$

By Lemma 1, $\lim_{N\to\infty} L^N(f^N)/N = \lim_{N\to\infty} (L^N(f^N) - 1)/N = L(\phi)$ and thus, by Lemma 2,

$$\lim_{N\to\infty} \frac{T^N(L^N(f^N))}{N} = \lim_{N\to\infty} \frac{T^N(L^N(f^N) - 1)}{N} = T(L(\phi)).$$

Corollary 1. For every sequence $(t^N)$ of integers such that $t^N/N \to t$ as $N$ goes to
infinity, $F^N(t^N)/N$ converges to $F(t)$, where the function $F : \mathbb{R}_+ \to [0,q[$ is $F = \tau^{-1} = G \circ T^{-1}$, i.e.

$$F(t) = \sum_{i=1}^K \left(q_i - T^{-1}(t)\right)_+.$$

The proof, very similar to the previous proofs, is omitted. Another expression for $F$ is
obtained in Section 5 in the analysis of the open-loop oracle policy. This allows us to finish the
proof of Theorem 1.

Proof of Theorem 1: The fact that, almost surely, the sequence of increasing processes
$t \mapsto F^N(\lfloor Nt \rfloor)/N$ uniformly converges to $F$ on every compact of $\mathbb{R}_+$ is a consequence of
Corollary 1 and Dini's Theorem. This is sufficient, since $F$ is upper-bounded by $q$.
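The limit $F = G \circ T^{-1}$ of Corollary 1 has no closed form in general, but since $T$ is strictly decreasing on $]0, q_1]$ it can be inverted by bisection. A minimal Python sketch, assuming nothing beyond the definitions of $G$ and $T$ above (the function names are hypothetical):

```python
import math


def G(q, lam):
    """G(lambda) = sum_i (q_i - lambda)_+."""
    return sum(max(qi - lam, 0.0) for qi in q)


def T(q, lam):
    """T(lambda) = sum_{i: q_i > lambda} log(q_i / lambda), strictly decreasing on ]0, max q]."""
    return sum(math.log(qi / lam) for qi in q if qi > lam)


def F_limit(q, t, iters=200):
    """F(t) = G(T^{-1}(t)), with T^{-1}(t) computed by bisection on ]0, max q]."""
    if t <= 0:
        return 0.0
    lo, hi = 0.0, max(q)  # T(hi) = 0 < t, and T(lambda) -> +infinity as lambda -> 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if T(q, mid) > t:
            lo = mid  # threshold too low: reaching it would take more than t
        else:
            hi = mid
    return G(q, hi)


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0, 5.0):
        print(t, F_limit([0.5, 0.25], t))
```

For a single expert this returns $q_1(1 - e^{-t})$; with two experts, only expert 1 is used as long as $t < \log(q_1/q_2)$, after which both thresholds decrease together.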



4 Macroscopic optimality of the Good-UCB algorithm 

After $n$ requests to expert $i \in \{1,\ldots,K\}$, denote by $R_{i,n}^N = (Q_i^N - F_i^N(n))/N$ the proportion
of interesting items not yet found with that expert. To estimate this number, we use the
Good-Turing estimator

$$\hat R_{i,n}^N = \frac{1}{n}\sum_{x\in B_i^N} \mathbb{1}\left\{\sum_{m=1}^n \mathbb{1}\{X_{i,m}^N = x\} = 1\right\}.$$






In particular, note that under assumption (i), the estimator $\hat R_{i,t}$ defined in Section 2.2 satisfies
$\hat R_{i,t} = \hat R_{i,n_{i,t}}^N$. To simplify the proof, we use here a slightly different confidence bonus than the
one proposed on line 3 of the algorithm. We define here the following upper confidence bound:

$$u_{i,n}^N = \hat R_{i,n}^N + \sqrt{\frac{(2/n + 1/N)^2\, n \log(2KN^4)}{2}}.$$

According to Proposition 1, the event $C^N$ such that

$$\forall i \in \{1,\ldots,K\},\ \forall n \in \{1,\ldots,N^2\}, \quad u_{i,n}^N - \frac{1}{n} - 2\sqrt{\frac{(2/n + 1/N)^2\, n \log(2KN^4)}{2}} \le R_{i,n}^N \le u_{i,n}^N \qquad (5)$$

has probability at least $1 - N^{-2}$.
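The bonus in $u^N_{i,n}$ is the width of Proposition 1 with $p_{\max} = 1/N$ and $\delta = 1/(KN^4)$; a union bound over the $K$ experts and the $N^2$ values of $n$ then gives the failure probability $KN^2/(KN^4) = N^{-2}$. The short Python sketch below only evaluates these quantities; the names `bonus`, `ucb` and `union_failure` are hypothetical, and the numerical values in the usage lines are illustrative.

```python
import math


def bonus(n, N, K):
    """Confidence bonus of u^N_{i,n}: Proposition 1 with p_max = 1/N and delta = 1/(K N^4)."""
    return math.sqrt((2.0 / n + 1.0 / N) ** 2 * n * math.log(2 * K * N ** 4) / 2.0)


def ucb(r_hat, n, N, K):
    """Upper confidence bound u^N_{i,n} = R_hat^N_{i,n} + bonus."""
    return r_hat + bonus(n, N, K)


def union_failure(N, K):
    """Union bound over K experts and n = 1..N^2, each event failing w.p. delta = 1/(K N^4)."""
    delta = 1.0 / (K * N ** 4)
    return K * N ** 2 * delta


if __name__ == "__main__":
    N, K = 10 ** 6, 7
    for n in (10, 10 ** 4, 10 ** 6):
        print(n, bonus(n, N, K))
    print("P(not C^N) <=", union_failure(N, K))
```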

The Good-Turing optimistic strategy consists in making a request, at each step, to the
expert maximizing the upper-confidence bound $u_{i,n_{i,t}}^N$, where $n_{i,t}$ denotes the (random) number
of requests to expert $i$ before time $t$. Denote by $\tilde\tau^N(f)$ the time required by this algorithm to
collect $f$ interesting items, and by $\tilde F^N(t)$ the number of interesting items found by Good-UCB
in the first $t$ rounds.

Theorem 4. In the macroscopic limit, $\tilde\tau^N(f^N)/N$ converges a.s. to the same limit $\tau(\phi)$ as for
the oracle closed-loop policy when $f^N/N$ tends to $\phi$. Moreover, $\tilde F^N(t^N)/N$ converges a.s. to the
limiting proportion $F(t)$ found during the same time by the OCL policy when $t^N/N$ converges
to $t$.

As for Theorem 1, this result directly leads to Theorem 2. To prove Theorem 4, we
proceed as for Theorem 3: we first consider, for every $\varepsilon > 0$ and for every $l \in \{\varepsilon N, \ldots, Q_1^N\}$,
the number $\tilde T^N(l)$ of steps until, using the Good-UCB algorithm, at most $l$ interesting items
remain undiscovered on all experts. Obviously, on $C^N$, the number $\tilde T^N(l)$ is upper-bounded by
the first time

$$U^N(l) = \inf\left\{t \ge K : \forall i \in \{1,\ldots,K\},\ u_{i,n_{i,t}}^N < \frac{l}{N}\right\}$$

when all the upper-bounds of the missing masses are below $l/N$. Write

$$U^N(l) = U_1^N(l) + \cdots + U_K^N(l),$$

where $U_i^N(l) = n_{i,U^N(l)}$ denotes the number of requests to expert $i \in \{1,\ldots,K\}$ up to time
$U^N(l)$. Four cases can be distinguished:

• either the draw does not belong to $C^N$, which has probability $O(N^{-2})$;

• or $U^N(l) > N^2$, which is easily shown to have probability $O(N^{-2})$; in fact, after $6N\log(N)$
requests to an expert $i$, the probability that there exists any item $(i,x) \in A^N$ which has
not been drawn at least twice is $O(N^{-2})$, and otherwise $u_{i,n}^N = O\left(\log(N)/\sqrt{N}\right) < \varepsilon$
for $N$ large enough;

• or $U_i^N(l) \le N^{2/3}$;

• or $N^{2/3} < U_i^N(l) \le U^N(l) \le N^2$ and, according to Equation (5),

$$R_{i,U_i^N(l)}^N \ge u_{i,U_i^N(l)}^N - \frac{C\log(KN)}{N^{1/3}}$$

for some positive absolute constant $C$. Now, remark that

  - $u_{i,U_i^N(l)-1}^N \ge l/N$, because the $U_i^N(l)$-th request to expert $i$ took place before time
    $U^N(l)$, and at that time expert $i$ had the highest index;

  - $u_{i,U_i^N(l)}^N - u_{i,U_i^N(l)-1}^N \ge -C'N^{-2/3}$ for some absolute constant $C'$: as $n > N^{2/3}$,
    $\hat R_{i,n}^N - \hat R_{i,n-1}^N \ge -2/n$ and, writing $b_n^N = \sqrt{(2/n + 1/N)^2\, n\log(2KN^4)/2}$ for the
    confidence bonus, $b_n^N - b_{n-1}^N \ge -1/n$, because $b_n^N / b_{n-1}^N \ge 1 - 1/n$ and $b_{n-1}^N < 1$ for $n > KN^{2/3}$.

  Hence $R_{i,U_i^N(l)}^N \ge l/N - \Delta^N/N$, with $\Delta^N \le C''N^{2/3}\log(KN)$ and $C'' = C + C'$. As
  $R_{i,k}^N = (Q_i^N - F_i^N(k))/N$ for every positive integer $k$, this implies that $U_i^N(l) \le D_{i,\,Q_i^N - l + \Delta^N}^N$.

Altogether, we obtain that on $C^N$:

$$U^N(l) \le KN^{2/3} + \sum_{i:\,Q_i^N > l - \Delta^N} D_{i,\,Q_i^N - l + \Delta^N}^N = KN^{2/3} + T^N(l - \Delta^N).$$

Thus, if $(l^N)$ is a sequence of integers of $[\varepsilon N, Q_1^N]$ such that $l^N/N \to \lambda \in [\varepsilon, q_1]$ as $N$ goes to
infinity, then

$$\limsup_{N\to\infty} \frac{\tilde T^N(l^N)}{N} \le \lim_{N\to\infty} \frac{KN^{2/3} + T^N(l^N - \Delta^N)}{N} = T(\lambda)$$

according to Lemma 2, almost surely, using the Borel-Cantelli lemma and the fact that
$\sum_N \mathbb{P}\left((C^N)^c \cup \{U^N(l^N) > N^2\}\right) < \infty$. This is sufficient to show that $\tilde T^N(l^N)/N$ converges a.s. to $T(\lambda)$
as $N$ goes to infinity. As for Corollary 1, this implies that $\tilde F^N(t^N)/N$ converges a.s. to $F(t)$ when
$t^N/N$ converges to $t$.



5 The open-loop oracle policy 

An open-loop policy must choose, for each horizon $t$, the respective numbers of requests $(n_1^N, \ldots, n_K^N)$
for each distribution (so that $n_1^N + \cdots + n_K^N = t^N$) in advance. It appears here that, in the
limit, the oracle open-loop (OOL) policy, which makes use of the parameters $(Q_1^N, \ldots, Q_K^N)$, is
as good as the OCL policy.

Recall that $R_{i,n_i^N}^N = (Q_i^N - F_i^N(n_i^N))/N$ is the proportion of interesting items not yet found
with expert $i$ after $n_i^N$ requests. Suppose that $t^N/N \to t$, and that $n_i^N/N \to \nu_i$ as $N$ goes to
infinity; it is easily shown that almost surely

$$\lim_{N\to\infty} R_{i,n_i^N}^N = \lim_{N\to\infty} \mathbb{E}\left[R_{i,n_i^N}^N\right] = \lim_{N\to\infty} \frac{Q_i^N}{N}\left(1 - \frac{1}{N}\right)^{n_i^N} = q_i \exp(-\nu_i).$$

Hence, the proportion of interesting items found with the allocation $(n_1^N, \ldots, n_K^N)$ almost surely
converges to $\sum_{i=1}^K q_i\left(1 - \exp(-\nu_i)\right)$. Defining

$$r(\nu) = \sum_{i=1}^K q_i \exp(-\nu_i),$$

it follows that finding the best macroscopic allocation reduces to the following constrained
convex minimization problem:

$$\min_\nu r(\nu) \quad \text{such that} \quad \nu_1 + \cdots + \nu_K = t \quad \text{and} \quad \forall i,\ \nu_i \ge 0.$$

The solution $r^*(t)$, reached at $\nu = \nu^*(t)$, is easily derived by classical optimization techniques:



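Proposition 2. For every $i \in \{1,\ldots,K\}$, let $\bar q_i = \exp\left(\frac{1}{i}\sum_{k=1}^i \log q_k\right)$ denote the geometric
mean of $q_1, \ldots, q_i$.

1. There exists $I(t) \in \{1,\ldots,K\}$ such that

$$\forall i \le I(t), \quad \nu_i^*(t) = \frac{t}{I(t)} + \log\frac{q_i}{\bar q_{I(t)}}, \qquad \forall i > I(t), \quad \nu_i^*(t) = 0.$$

Hence,

$$r^*(t) = I(t)\,\bar q_{I(t)} \exp\left(-\frac{t}{I(t)}\right) + \sum_{i > I(t)} q_i.$$

2. There exist $0 = t_1 < \cdots < t_K < +\infty$ such that

$$\forall t \in [t_i, t_{i+1}[, \quad I(t) = i \qquad (\text{with } t_{K+1} = +\infty).$$

The $(t_k)_k$ are such that

$$q_i + (i-1)\,\bar q_{i-1} \exp\left(-\frac{t_i}{i-1}\right) = i\,\bar q_i \exp\left(-\frac{t_i}{i}\right).$$

For instance, $t_2 = \log(q_1/q_2)$.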
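Proof: Introduce the Lagrangian:

$$\mathcal{L}(\nu_1, \ldots, \nu_K, \lambda, h_1, \ldots, h_K) = \sum_{i=1}^K q_i \exp(-\nu_i) + \lambda\left(\sum_{i=1}^K \nu_i - t\right) - \sum_{i=1}^K h_i \nu_i.$$

We need to find the solution of:

$$\forall i \in \{1,\ldots,K\}, \quad -q_i \exp(-\nu_i) + \lambda - h_i = 0,$$

$$\sum_{i=1}^K \nu_i = t,$$

$$\forall i \in \{1,\ldots,K\}, \quad h_i \nu_i = 0 \ \text{ and } \ h_i \ge 0.$$

We first obtain that

$$\nu_i = \log q_i - \log(\lambda - h_i).$$

Denoting $\mathcal{A} = \{i : \nu_i > 0\}$, and using that $i \in \mathcal{A} \Rightarrow h_i = 0$, we get:

$$t = \sum_{i\in\mathcal{A}} \log(q_i) - |\mathcal{A}|\log(\lambda),$$

from which we get

$$-\log(\lambda) = \frac{t}{|\mathcal{A}|} - \frac{1}{|\mathcal{A}|}\sum_{i\in\mathcal{A}} \log q_i,$$

and then for all $i \in \mathcal{A}$:

$$\nu_i = \log q_i + \frac{t}{|\mathcal{A}|} - \frac{1}{|\mathcal{A}|}\sum_{k\in\mathcal{A}} \log q_k.$$

Next, observe that $\nu_i > 0 \iff q_i > \lambda$: in fact, if $\nu_i = 0$ then the first equation gives
$-q_i + \lambda - h_i = 0$, and $0 \le h_i = \lambda - q_i$. Conversely, if $\nu_i > 0$ then $h_i = 0$, and $\nu_i = \log(q_i/\lambda) > 0$
implies $q_i > \lambda$. Thus, there exists $I(t)$ such that $\mathcal{A} = \{1, \ldots, I(t)\}$, and for all $i \le I(t)$,

$$\nu_i = \log\frac{q_i}{\bar q_{I(t)}} + \frac{t}{I(t)}.$$

Moreover,

$$r^*(t) = r(\nu_1, \ldots, \nu_{I(t)}, 0, \ldots, 0) = \sum_{i \le I(t)} \bar q_{I(t)} \exp\left(-\frac{t}{I(t)}\right) + \sum_{i > I(t)} q_i = I(t)\,\bar q_{I(t)} \exp\left(-\frac{t}{I(t)}\right) + \sum_{i > I(t)} q_i.$$

The instants $(t_i)_{1 \le i \le K}$ are such that

$$(i-1)\,\bar q_{i-1} \exp\left(-\frac{t_i}{i-1}\right) + \sum_{k > i-1} q_k = i\,\bar q_i \exp\left(-\frac{t_i}{i}\right) + \sum_{k > i} q_k,$$

which is equivalent to

$$q_i + (i-1)\,\bar q_{i-1} \exp\left(-\frac{t_i}{i-1}\right) = i\,\bar q_i \exp\left(-\frac{t_i}{i}\right).$$

For $i = 2$, this gives

$$0 = q_2 + q_1 \exp(-t_2) - 2\sqrt{q_1 q_2} \exp\left(-\frac{t_2}{2}\right) = \left(\sqrt{q_2} - \sqrt{q_1}\exp\left(-\frac{t_2}{2}\right)\right)^2,$$

which leads to $t_2 = \log(q_1/q_2)$.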
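Proposition 2 translates into a short water-filling routine: find the largest $I$ whose threshold $\lambda = \bar q_I \exp(-t/I)$ stays below $q_I$, then give $\nu_i^* = t/I + \log(q_i/\bar q_I)$ to the first $I$ experts and nothing to the others. The Python sketch below assumes the $q_i$ are sorted in decreasing order (as in the analysis); the function names are hypothetical, and the usage lines check optimality against a brute-force grid for two experts.

```python
import math


def ool_allocation(q, t):
    """Optimal macroscopic open-loop allocation nu*(t) of Proposition 2.

    q: proportions q_1 >= ... >= q_K > 0 (sorted in decreasing order); t >= 0.
    Returns (nu, I) where I = I(t) is the number of experts that receive requests.
    """
    K = len(q)
    if t == 0:
        return [0.0] * K, 1
    for I in range(K, 0, -1):
        gbar = math.exp(sum(math.log(x) for x in q[:I]) / I)  # geometric mean of q_1..q_I
        lam = gbar * math.exp(-t / I)                          # Lagrange multiplier lambda
        if q[I - 1] > lam:                                     # largest consistent active set
            nu = [t / I + math.log(x / gbar) for x in q[:I]] + [0.0] * (K - I)
            return nu, I
    raise ValueError("t must be non-negative")


def r(q, nu):
    """Macroscopic proportion of interesting items left undiscovered, r(nu)."""
    return sum(x * math.exp(-v) for x, v in zip(q, nu))


if __name__ == "__main__":
    q = [0.5, 0.25]
    for t in (0.5, math.log(2), 2.0):
        nu, I = ool_allocation(q, t)
        print(f"t={t:.3f}  I(t)={I}  nu*={nu}  r*={r(q, nu):.5f}")
```

Scanning $I$ downwards is enough: if $q_{I+1} \le \bar q_{I+1}\exp(-t/(I+1))$, this threshold lies between $q_{I+1}$ and $\bar q_I \exp(-t/I)$, so the first $I$ that passes the test also satisfies $q_{I+1} \le \lambda$.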

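Theorem 5. The proportion of items found by the open-loop oracle policy uniformly converges
to $F$ in the macroscopic limit.

The proportion of interesting items found by the OOL policy is

$$\sum_{i=1}^K q_i - r^*(t) = \sum_{i \le I(t)} \left(q_i - \bar q_{I(t)} \exp\left(-\frac{t}{I(t)}\right)\right) = \sum_{i=1}^K \left(q_i - \Lambda(t)\right)_+,$$

where $\Lambda(t) = \bar q_{I(t)} \exp\left(-t/I(t)\right) \in [0, q_{I(t)}]$. To conclude, it remains only to remark that $\Lambda = T^{-1}$:
in fact, if $\lambda$ is such that $q_{i_0+1} \le \lambda < q_{i_0}$, then $I(T(\lambda)) = i_0$ and

$$\Lambda(T(\lambda)) = \bar q_{i_0} \exp\left(-\frac{T(\lambda)}{i_0}\right) = \bar q_{i_0} \exp\left(-\frac{1}{i_0}\sum_{i \le i_0} \log\frac{q_i}{\lambda}\right) = \lambda.$$

If $\lambda < q_K$, the same holds with $i_0 = K$.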
Remark 1. Proposition 2 makes it easy to compare the macroscopic performance of the Good-UCB
algorithm with balanced sampling: when all distributions are sampled equally often, the
proportion of unseen interesting items at time $t$ is $\sum_{i=1}^K q_i \exp(-t/K) = K\tilde q_K \exp(-t/K)$,
with $\tilde q_K = \left(\sum_{i=1}^K q_i\right)/K$. On the other side, for the Good-UCB algorithm, this proportion is
$I(t)\,\bar q_{I(t)} \exp(-t/I(t)) + \sum_{i > I(t)} q_i$. The ratio between the two is a non-decreasing function of time
(on $[t_i, t_{i+1}[$, writing $\lambda = \bar q_i \exp(-t/i)$, the derivative of its logarithm is $-1/K + \lambda/(i\lambda + \sum_{k>i} q_k) \ge 0$,
because $q_k \le \lambda$ for $k > i$), and for $t \ge t_K$ it takes the nice expression $\tilde q_K/\bar q_K \ge 1$: the ratio of an
arithmetic mean to a geometric mean, which is (as expected) all the larger as the $(q_i)_i$ are unbalanced.
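The comparison of Remark 1 can be evaluated numerically. The Python sketch below assumes the proportions of the first experiment of Section 6 ($q_i = 51.2\% \cdot 2^{-(i-1)}$, $i = 1, \ldots, 7$) and computes the ratio between the unseen proportions under balanced sampling and under the oracle allocation of Proposition 2; for $t \ge t_K$ it equals $\tilde q_7/\bar q_7 = 0.1451\ldots/0.064 \approx 2.27$. The function names are hypothetical.

```python
import math


def unseen_oracle(q, t):
    """r*(t) of Proposition 2; q sorted in decreasing order, t >= 0."""
    for I in range(len(q), 0, -1):
        gbar = math.exp(sum(math.log(x) for x in q[:I]) / I)  # geometric mean of q_1..q_I
        lam = gbar * math.exp(-t / I)
        if q[I - 1] > lam or I == 1:  # largest consistent active set (I = 1 when t = 0)
            return I * lam + sum(q[I:])


def unseen_balanced(q, t):
    """Unseen proportion when all K experts are sampled equally often."""
    return sum(q) * math.exp(-t / len(q))


if __name__ == "__main__":
    q = [0.512 / 2 ** k for k in range(7)]
    for t in (0.0, 1.0, 5.0, 10.0, 20.0):
        print(t, unseen_balanced(q, t) / unseen_oracle(q, t))
```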

6 Simulations 

We provide a few simulations illustrating the behaviour of the Good-UCB algorithm in practice. 
In order to illustrate the convergence properties shown in Sections 3 and 4, we first consider
an example with K = 7 different sampling distributions satisfying assumptions (i), (ii), (iii),
with respective proportions of interesting items $q_1$ = 51.2%, $q_2$ = 25.6%, $q_3$ = 12.8%, $q_4$ =
6.4%, $q_5$ = 3.2%, $q_6$ = 1.6% and $q_7$ = 0.8%. Figure 1 displays the number of items found as a
function of time by Good-UCB (solid), the oracle (dashed) and a balanced sampling scheme
simply alternating between experts (dotted). The results are presented for sizes N = 128, N =
500, N = 1000 and N = 10000, each time for one representative run (averaging over different
runs removes the interesting variability of the process). The convergences proved in Corollary 1
and Theorem 4 are clearly visible. Moreover, it can be seen that, even for very moderate values of
N, Good-UCB significantly outperforms uniform sampling, even though it is clearly outperformed by
the oracle.

For these simulations, the parameter c of Algorithm Good-UCB has been taken equal to 
1/2, which is a rather conservative choice. In fact, it appears that during all rounds of all runs, 
all upper-confidence bounds did contain the actual missing mass. Of course, a bolder choice of 
c can only improve the performance of the algorithm, as long as the confidence level remains 
sufficient. 

In order to illustrate the efficiency of the Good-UCB algorithm in a more difficult setting, 
which does not satisfy any of the assumptions (i), (ii) and (iii), we also considered the following 
(artificial) example: K = 5 probabilistic experts draw independent sequences of geometrically 
distributed random variables, with expectations 100, 300, 500, 700 and 900 respectively. The 
set of interesting items is the set of prime numbers. We compare the oracle policy, Good-UCB 
and uniform sampling. The results are displayed in Figure 2. Even if the difference remains
significant between Good-UCB and the oracle, the former still performs significantly better
than uniform sampling during the entire discovery process. In this example, choosing a smaller
parameter c seems to be preferable; this is due to the fact that the proportion of interesting items
on each arm is low; in that case, one can show by using tighter concentration inequalities that the
concentration of the Good-Turing estimator is actually better than suggested by Proposition 1.
In fact, this experiment suggests that the value of c should be chosen smaller when the remaining 
missing mass is small. 
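A self-contained Python sketch of the second experiment is given below. It implements Good-UCB without assumption (i), using the hapax indicator $U_{i,t}(x) = \mathbb{1}\{O_{i,t}(x) = O_t(x) = 1\}$ of Section 2.2, and compares it with round-robin (uniform) sampling. The geometric experts are assumed to take values in $\{1, 2, \ldots\}$, the bonus $c\sqrt{\log(t)/n_{i,t-1}}$ is our reading of Algorithm 1, the oracle curve of Figure 2 is not reproduced, and all function names are hypothetical.

```python
import math
import random


def is_prime(m):
    """Trial-division primality test (defines the set of interesting items)."""
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    d = 3
    while d * d <= m:
        if m % d == 0:
            return False
        d += 2
    return True


def geometric_expert(mean, rng):
    """Expert drawing geometric variables on {1, 2, ...} with the given expectation."""
    p = 1.0 / mean
    return lambda: 1 + int(math.log(1.0 - rng.random()) / math.log(1.0 - p))


def good_ucb_general(experts, interesting, horizon, c):
    """Good-UCB with overlapping supports: x is a hapax of expert i iff it has been
    observed exactly once overall and that observation came from expert i."""
    K = len(experts)
    n = [0] * K        # n_{i,t}
    total = {}         # O_t(x) for interesting x
    owner = {}         # expert responsible for the first observation of x
    hapax = [0] * K    # sum over interesting x of U_{i,t}(x)
    found, curve = 0, []
    for t in range(1, horizon + 1):
        if t <= K:
            i = t - 1
        else:
            # Assumed index: R_hat_{i,t-1} + c * sqrt(log(t) / n_{i,t-1}).
            i = max(range(K), key=lambda j: hapax[j] / n[j] + c * math.sqrt(math.log(t) / n[j]))
        x = experts[i]()
        n[i] += 1
        if interesting(x):
            k = total.get(x, 0)
            total[x] = k + 1
            if k == 0:      # first observation: new item, hapax of expert i
                owner[x] = i
                hapax[i] += 1
                found += 1
            elif k == 1:    # second observation: no longer a hapax of its owner
                hapax[owner[x]] -= 1
        curve.append(found)
    return curve


def round_robin(experts, interesting, horizon):
    """Uniform (balanced) sampling: request the experts in turn."""
    seen, curve = set(), []
    for t in range(horizon):
        x = experts[t % len(experts)]()
        if interesting(x):
            seen.add(x)
        curve.append(len(seen))
    return curve


if __name__ == "__main__":
    rng = random.Random(0)
    experts = [geometric_expert(m, rng) for m in (100, 300, 500, 700, 900)]
    ucb = good_ucb_general(experts, is_prime, horizon=5000, c=0.1)
    uni = round_robin(experts, is_prime, horizon=5000)
    print("primes found:", ucb[-1], "(Good-UCB) vs", uni[-1], "(uniform)")
```

The values c = 0.1 and c = 0.02 correspond to the two panels of Figure 2; a single run is shown here, so the gap between the two strategies varies from seed to seed.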

7 Conclusions 

This paper introduced an original problem, optimal discovery with probabilistic expert advice.
We proposed an algorithm to solve this problem and demonstrated its efficiency both
analytically and through simulations.

This work can be extended along several directions. First, it would be interesting to analyze
the behaviour of Good-UCB under less restrictive assumptions on the experts. Note that
assumptions (ii) and (iii) are used only to (considerably) simplify the analysis of the oracle
policy, and hence to prove the optimality of Good-UCB. Removing assumption (ii) seems fairly
straightforward, at the cost of an additional level of notation. Because of the intricate behavior
of oracle policies in that case, it is not clear how the analysis could be carried out if assumption
(iii) were removed, though it seems reasonable to conjecture that Good-UCB would still be
macroscopically optimal. Good-UCB is designed to work even when assumption (i) is not
satisfied, but the analysis is then complicated by the explicit dependency between the missing
masses of the different experts.

Figure 1: Number of items found by Good-UCB (solid), the oracle (dashed), and uniform
sampling (dotted) as a function of time for sizes N = 128, N = 500, N = 1000 and N = 10000
in a 7-expert setting.

Figure 2: Number of prime numbers found by Good-UCB (solid), the oracle (dashed), and
uniform sampling (dotted) as a function of time, using geometric experts with means
100, 300, 500, 700 and 900, for c = 0.1 (left) and c = 0.02 (right).

One may also wonder whether it would be possible to obtain optimal rates of convergence
(in the macroscopic limit sense) for this problem, and whether Good-UCB is optimal in that
sense too. Finally, another macroscopic limit deserves to be investigated, where the number of
interesting items for each arm remains constant while N and n go to infinity. Note that a
Poisson regime appears in this setting. The analysis of Good-UCB might then be possible by using
a better concentration bound for the Good-Turing estimator, such as the Boucheron-Massart-Lugosi
inequality [BLM09]. This could also help explain why, in the second experiment presented in
Section 6, the parameter c should be chosen decreasing with time.



8 Appendix 

Lemma 3. For all $1 \le k \le n$,

$$ -\frac{1}{k} + \log\frac{n}{k} \;\le\; \sum_{j=k+1}^{n} \frac{1}{j} \;\le\; \log\frac{n}{k}. \tag{6} $$

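Proof: The standard sum/integral comparison yields

$$ \log\frac{n+1}{k+1} = \int_{k+1}^{n+1} \frac{dx}{x} \;\le\; \sum_{j=k+1}^{n} \frac{1}{j} \;\le\; \int_{k}^{n} \frac{dx}{x} = \log\frac{n}{k}, $$

but

$$ \log\frac{n+1}{k+1} \;\ge\; \log\frac{n}{k+1} = \log\frac{n}{k} - \log\left(1 + \frac{1}{k}\right) \;\ge\; \log\frac{n}{k} - \frac{1}{k}. $$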
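Inequality (6) is easy to sanity-check numerically. The helper below (a hypothetical name, `check_lemma3`) scans all pairs $1 \le k \le n \le n_{max}$ and returns the first pair violating (6), or `None`; it is a check for small n, not a substitute for the proof.

```python
import math


def check_lemma3(n_max, tol=1e-12):
    """Return the first (k, n) violating inequality (6), or None if none is found."""
    for n in range(1, n_max + 1):
        tail = 0.0  # holds sum_{j=k+1}^{n} 1/j for the current k (empty sum when k = n)
        for k in range(n, 0, -1):
            lower = -1.0 / k + math.log(n / k)
            upper = math.log(n / k)
            if not (lower <= tail + tol and tail <= upper + tol):
                return (k, n)
            tail += 1.0 / k  # now equals sum_{j=k}^{n} 1/j, i.e. the tail for k - 1
    return None
```

The small tolerance only absorbs floating-point rounding; the gaps in (6) are much larger than that for the values of n considered here.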



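Proof of Lemma 1. For (i), it suffices to notice that

$$ \frac{G^N(l^N)}{N} \longrightarrow G(\lambda) \quad \text{if} \quad \frac{l^N}{N} \to \lambda. $$

Moreover, the same argument shows that

$$ \limsup_{N\to\infty} \frac{G^N(l^N)}{N} = G\left(\limsup_{N\to\infty} \frac{l^N}{N}\right) \quad\text{and}\quad \liminf_{N\to\infty} \frac{G^N(l^N)}{N} = G\left(\liminf_{N\to\infty} \frac{l^N}{N}\right). $$

Hence, if $G^N(l^N)/N$ converges, the limit belongs to $[0, q]$ and can thus be written $G(\lambda)$ for some
$\lambda \in [0, q_1]$. Thus, $G(\liminf_N l^N/N) = G(\limsup_N l^N/N) = G(\lambda)$, which implies that
$l^N/N \to \lambda$, as $G$ is a continuous bijection.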

Concerning (ii): if $f^N/N \to \phi$, then the fact that $|G^N(L^N(f^N)) - f^N| \le K$ implies
that $G^N(L^N(f^N))/N \to \phi$ and thus, by (i), that $L^N(f^N)/N$ converges to a value $\lambda$ such that
$G(\lambda) = \phi$, i.e. $\lambda = L(\phi)$. The converse (which is not used in the sequel) is left to the reader.
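Proof of Lemma 2. It suffices to show that, for every expert $i \in \{1, \dots, K\}$, $D_i^N/N$
converges almost surely to $\log(q_i/\lambda)$ as $N$ goes to infinity. Write

$$ \frac{D_i^N}{N} - \mathbb{E}\left[\frac{D_i^N}{N}\right] = \sum_{k=l^N}^{Q_i^N - 1} \left( \frac{S_k}{N} - \mathbb{E}\left[\frac{S_k}{N}\right] \right). \tag{7} $$

For every positive integer $d$ and for $k \in \{l^N, \dots, Q_i^N - 1\}$, elementary manipulations of the
geometric distribution yield that

$$ \mathbb{E}\left[ \left| \frac{S_k}{N} - \mathbb{E}\left[\frac{S_k}{N}\right] \right|^d \right] \le \frac{c(d)}{\lambda^d N^d} $$

for some positive constant $c(d)$ depending only on $d$, and for $N$ large enough. Hence, taking (7)
to the fourth power and developing yields

$$ \mathbb{E}\left[ \left( \frac{D_i^N}{N} - \mathbb{E}\left[\frac{D_i^N}{N}\right] \right)^4 \right] \le \frac{c'}{N^2} $$

for some positive constant $c'$. Using Markov's inequality together with the Borel-Cantelli lemma,
this shows that $D_i^N/N - \mathbb{E}[D_i^N/N]$ converges almost surely to $0$ as $N$ goes to infinity. But

$$ \mathbb{E}\left[\frac{D_i^N}{N}\right] = \sum_{j=l^N+1}^{Q_i^N} \frac{1}{j} = \log\frac{Q_i^N}{l^N} + \epsilon_N $$

with $-1/l^N \le \epsilon_N \le 0$ according to Lemma 3, and thus

$$ \lim_{N\to\infty} \mathbb{E}\left[\frac{D_i^N}{N}\right] = \lim_{N\to\infty} \log\frac{Q_i^N}{l^N} = \log\frac{q_i}{\lambda}, $$

which concludes the proof.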
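The convergence in Lemma 2 can be observed by simulation. We read $D_i^N$ as the number of draws of expert i needed to bring its number of undiscovered interesting items down to about λN, when the expert samples uniformly among N items of which ⌊q_i N⌋ are interesting. The sketch below simulates that quantity directly, so that $D_i^N/N$ can be compared with log(q_i/λ). Both this interpretation of $D_i^N$ and the function name `discovery_time` are assumptions.

```python
import math
import random


def discovery_time(N, q, lam, seed=0):
    """Number of uniform draws among N items (the first floor(q*N) interesting)
    needed until at most floor(lam*N) interesting items remain undiscovered.

    Assumption: this is our reading of D_i^N in the proof of Lemma 2.
    """
    rng = random.Random(seed)
    n_interesting = int(q * N)
    target = int(lam * N)
    discovered = bytearray(n_interesting)  # 1 once an interesting item has been seen
    remaining = n_interesting
    draws = 0
    while remaining > target:
        x = rng.randrange(N)
        draws += 1
        if x < n_interesting and not discovered[x]:
            discovered[x] = 1
            remaining -= 1
    return draws


if __name__ == "__main__":
    for N in (1000, 10000, 100000):
        print(N, discovery_time(N, 0.5, 0.1) / N, math.log(0.5 / 0.1))
```

With q = 0.5 and λ = 0.1 the ratio should approach log 5 ≈ 1.609, and its run-to-run fluctuations shrink as N grows, in line with the almost-sure convergence established above.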